Abstract This work attempts to give new theoretical insights
into the absence of intermediate stages in the evolution of
language. In particular, an automata networks approach is
developed to address a crucial question: how can a population
of language users reach agreement on a linguistic convention?
To describe the appearance of sharp transitions in the
self-organization of language, an extremely simple model of
(working) memory is adopted. At each time step, language users
simply “lose” part of their word-memories. In computer
simulations on low-dimensional lattices, sharp transitions
appear at critical values that depend on the size of the
neighborhoods of the individuals.

Keywords Automata networks • Linguistic Conventions

1 Introduction

Contrary to the widespread view of language evolution [3117],
which proposes a gradual transition (through successive stages)
between a “protolanguage” (a modern language minus syntax) and
modern languages, recent works have suggested the absence of
intermediate stages. For instance, the appearance of phase
transitions (scaling relations close to Zipf’s law) has been
suggested in the emergence of vocabularies under least-effort
constraints.

This work attempts to give new theoretical insights into the
absence of intermediate stages in the evolution of language.
The starting point is to develop a mathematical approach to a
crucial question: how can a population of language users reach
agreement on a linguistic convention? Surprisingly, language
users collectively reach shared languages without any kind of
central control or “telepathy” influencing the formation of
language, and only from local conversations between a few
participants. The solution is based on two opposite alignment
preferences, which guide the behavior of language users: the
selection of the words that give the highest chance of
communicative success, and the removal of the words that lead
to failures during communication [12]. These procedures can be
understood as part of the lateral inhibition strategy. If each
convention is associated with a score measuring its amount of
success, the score will decrease after unsuccessful
communicative interactions, and the convention will be used
less. Consequently, the outcome of alignment strategies is the
self-organization of agreement: the successful words become
more common, the individuals align their own languages, and the
chance of successful interactions increases.

To describe the appearance of sharp transitions in language
formation, an extremely simple model of (working) memory [1] is
adopted, understood as a temporary finite memory involved in
on-line tasks and, especially, in language production and
comprehension. At each time step, language users simply “lose”
part of their word-memories. What is more, it is hypothesized
that the features of language (in particular, the consensus on
a linguistic convention) emerge drastically at some critical
memory loss capacity [B].

The view of this work is based on an automata networks model.
Automata networks are attractive models for systems that
exhibit self-organization. From extremely simplified rules of
local interactions inspired by real phenomena, automata
networks exhibit astonishingly rich patterns of behavior. The
essential feature of the adopted framework is locality: the
emergence of complex language patterns is described from local
communicative interactions alone.

The work proceeds by introducing basic definitions and the
rules of the automata (Section 2). This is followed in Sections
3 and 4 by experiments based on an energy function that
measures the amount of local agreement between individuals.
Finally, a brief discussion about sharp transitions in language
formation is presented.

2 The model 

2.1 Basic notions 

Let $G = (P, I)$ be a connected and undirected graph with
vertex set $P = \{1, \ldots, n\}$ and edge set $I$. The set $P$
represents the finite population of individuals, whereas $I$ is
the set of possible interactions between individuals. A crucial
element of the model is that the interactions (defined by $I$)
are local. The individual $u \in P$ participates in
communicative interactions only with “close” neighbors. To
measure the degree of “closeness”, a parameter $r$ (the radius)
is introduced. The neighborhood of radius $r$ of $u$ is the set
$V_u^r = \{v \in P : 0 < d(u, v) \leq r\}$, where $d$ is the
usual distance on $G$ (the length of the shortest path between
two vertices). Thus, communicative interactions occur between
an individual $u \in P$ and its associated set of neighbors
located in $V_u^r$.
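The neighborhood definition can be illustrated with a breadth-first search. The following is a minimal sketch, not the author's code: the function name `neighborhood` and the adjacency-dictionary representation of the graph are assumptions made for illustration, and the radius is taken to be inclusive (distance at most $r$).

```python
from collections import deque

def neighborhood(adjacency, u, r):
    """Return the vertices v with 0 < d(u, v) <= r (shortest-path distance)."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        v = queue.popleft()
        if dist[v] == r:
            continue  # do not explore beyond radius r
        for w in adjacency[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return {v for v in dist if v != u}

# Hypothetical example: the path graph 1 - 2 - 3 - 4 - 5.
adjacency = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(neighborhood(adjacency, 3, 1))
print(neighborhood(adjacency, 1, 2))
```

The vertex itself is excluded, in line with the condition $0 < d(u, v)$.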

$W$ is a finite set of words. Each individual $u \in P$ is
characterized by its state pair $(M_u, x_u)$, where
$M_u \subseteq W$ is the memory to store words, and $x_u$ is a
word of $M_u$ that $u$ conveys to the neighbors in $V_u^r$. In
this context, within a communicative interaction (a vertex and
its neighbors), the “central” vertex plays the role of “hearer”
and the neighbors play the role of “speakers”. Indeed, the
central vertex receives the words conveyed by its neighbors.
This set of conveyed words is called $W_u$, for $u \in P$. Some
of the conveyed words are known to the central vertex and some
are unknown. Two sets are defined:
$B_u = \{x_v \in W_u : x_v \in M_u\}$, the set of known words,
and $N_u = \{x_v \in W_u : x_v \notin M_u\}$, the set of
unknown words.
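As a small illustration of these definitions, the sets of conveyed, known and unknown words can be computed with plain set operations. This is only a sketch; the function name `split_conveyed_words` and the use of integers as words are assumptions.

```python
def split_conveyed_words(memory, neighbor_words):
    """Return (W_u, B_u, N_u) for a hearer whose memory is M_u.

    memory         -- the set M_u of words stored by the hearer
    neighbor_words -- the words x_v conveyed by the neighbors of the hearer
    """
    conveyed = set(neighbor_words)    # W_u: words conveyed by the neighbors
    known = conveyed & set(memory)    # B_u: conveyed words already in M_u
    unknown = conveyed - set(memory)  # N_u: conveyed words not in M_u
    return conveyed, known, unknown

# Hypothetical hearer with memory {2} and three speakers conveying 2, 3 and 4.
conveyed, known, unknown = split_conveyed_words({2}, [2, 3, 4])
print(conveyed, known, unknown)
```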

2.2 Automata networks 

On $G$, the naming automata is defined as the tuple
$\mathcal{A} = (G, Q, (f_u : u \in P), \phi)$, where

- $Q$ is the set of all possible states $\mathcal{P}(W) \times W$
  ($\mathcal{P}(W)$ denotes the set of subsets of $W$). So, the
  state associated with the vertex $u \in P$, $(M_u, x_u)$, is an
  element of $Q$ ($(M_u, x_u) \in Q$).

- $(f_u : u \in P)$ is the set of local rules. The naming
  automata $\mathcal{A}$ is uniform, that is, each cell is
  associated with the same local rule. This rule takes as input
  the set $W_u$ (in particular, $B_u$ and $N_u$) and gives as
  output the new state of the vertex $u$.

- $\phi$ is a function, the updating scheme, that defines the
  order in which the vertices are updated. Traditionally,
  automata networks suppose the existence of a global “clock”
  that establishes that all cells are updated at the same time.
  In this work, a fully asynchronous scheme is considered. This
  updating scheme implies that at each time step one vertex is
  selected uniformly at random. The choice of a fully
  asynchronous scheme is motivated by the typical updating order
  of the Naming Game. In that model, at each time step two
  vertices (speaker and hearer) are chosen at random.

The configuration $X(t)$ at time step $t$ is the family
$\{(M_u, x_u)\}_{u \in P}$. The vertex $u \in P$ is chosen
according to the fully asynchronous scheme. The configuration
at step $t + 1$, $X(t + 1)$, is obtained by updating the state
of the vertex $u$ through the local rule $f_u$. A configuration
$X'$ is a fixed point of the dynamics if $X'(t) = X'(t + 1)$
for any vertex update.
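The fully asynchronous scheme and the fixed-point condition can be sketched as a generic driver loop. Everything below is illustrative: `run_asynchronous`, `is_fixed_point`, the ring `RING` and the deterministic `toy_rule` are hypothetical names, and the toy states are plain integers rather than the memory-word pairs of the model. The fixed-point check as written is only meaningful for deterministic rules.

```python
import random

def run_asynchronous(config, local_rule, steps, rng):
    """Update one uniformly chosen vertex per time step (fully asynchronous)."""
    vertices = list(config)
    for _ in range(steps):
        u = rng.choice(vertices)           # one vertex, uniformly at random
        config[u] = local_rule(u, config)  # only the state of u changes
    return config

def is_fixed_point(config, local_rule):
    """True if updating any single vertex leaves the configuration unchanged."""
    return all(local_rule(u, config) == config[u] for u in config)

# Toy deterministic rule on a ring of 5 vertices (assumption, for illustration):
# each vertex adopts the minimum value among itself and its two neighbors.
RING = {u: [(u - 1) % 5, (u + 1) % 5] for u in range(5)}

def toy_rule(u, config):
    return min([config[u]] + [config[v] for v in RING[u]])

start = {0: 3, 1: 1, 2: 4, 3: 1, 4: 5}
final = run_asynchronous(dict(start), toy_rule, 200, random.Random(0))
print(final, is_fixed_point(final, toy_rule))
```

Passing the random generator explicitly keeps runs reproducible, which helps when averaging over several initial conditions.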


2.3 Local rules 

The local rules $(f_u : u \in P)$ are based on the concept of
alignment. Suppose that at time step $t$ the vertex $u$ has
been selected. Then $u$ and its neighbors in $V_u^r$ define a
communicative interaction, in which the vertex $u$ plays the
role of “hearer” and the neighbors in $V_u^r$ play the role of
“speakers”. The vertex $u$ faces two possible actions: (1)
$M_u$ is updated by adding the words of $N_u$ (addition action
(A)) in order to increase the chance of future successful
interactions; or (2) $M_u$ is updated by deleting the words
(collapse action (C)) that do not participate in successful
interactions.

To measure the amount of memory loss, a third action,
forgetfulness (F), is introduced. Let $p \in [0, 1]$ be a
parameter. In simple terms, as $p$ increases, the amount of
memory loss increases. $P_u$ is the subset of
$M_u \setminus \{x_u\}$ (the set $M_u$ without the element
$x_u$) formed by $\lfloor p(|M_u| - 1) \rfloor$ words (selected
at random without replacement from $M_u \setminus \{x_u\}$),
where $\lfloor p(|M_u| - 1) \rfloor$ means the largest integer
not greater than $p(|M_u| - 1)$. Then, the family of local
rules reads

$$
f_u =
\begin{cases}
\text{(F)}\ (M_u \setminus P_u,\ x_u), \quad \text{(A)}\ (M_u \cup N_u,\ x_u) & \text{if } \emptyset \neq N_u, \\
\text{(C)}\ (\{\min(B_u)\},\ \min(B_u)) & \text{if } \emptyset = N_u.
\end{cases}
\tag{1}
$$

In other words, in the case that $\emptyset \neq N_u$ the local
rule acts in two steps: first the forgetfulness action and then
the addition action (along these two steps the set $W_u$ does
not change) (see Fig. 1).

Fig. 1 Example of forgetfulness (F) and addition (A) actions.
$W = \{1, 2, 3, 4\}$ is the set of words. Suppose that at some
time step the vertex $u$ has been chosen. Three speakers in
$V_u^r$ participate in the interaction. It is assumed that
$p = 0.5$. In the first row (F), $u$ “forgets” the word “1”, so
$(\{1, 2\}, 2)$ is updated to $(\{2\}, 2)$. In the second row
(A), $(\{2\}, 2)$ is updated to $(\{2, 3, 4\}, 2)$.

In this paper, a particular collapse action is considered.
Suppose that each agent is endowed with an internal total order
for the set of words (equivalently, if we consider
$W \subseteq \mathbb{Z}$ then the agents are endowed with the
order $<$). Every agent chooses to collapse to the minimum word
present in the neighborhood. This rule represents, for example,
the situation in which the words differ according to their
degree of relevance related to linguistic contexts.
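The three actions can be put together in a short sketch of one hearer update. This is an illustrative reading of the local rule, not the author's implementation: the name `local_rule` and its arguments are assumptions, words are taken to be integers ordered by `<`, the number of forgotten words is computed with the floor function as described above, and the neighborhood is assumed to be non-empty.

```python
import math
import random

def local_rule(memory, word, neighbor_words, p, rng):
    """Return the new state (M_u, x_u) of the hearer u.

    memory         -- set M_u (must contain `word`)
    word           -- the word x_u conveyed by u
    neighbor_words -- the words conveyed by the neighbors of u (non-empty)
    p              -- forgetfulness parameter in [0, 1]
    rng            -- a random.Random instance
    """
    conveyed = set(neighbor_words)  # W_u
    known = conveyed & memory       # B_u
    unknown = conveyed - memory     # N_u
    if unknown:
        # (F) forget floor(p * (|M_u| - 1)) words, never the conveyed word x_u.
        candidates = sorted(memory - {word})
        n_forget = math.floor(p * (len(memory) - 1))
        forgotten = set(rng.sample(candidates, n_forget))
        # (A) add every unknown word; the conveyed word is unchanged.
        return (memory - forgotten) | unknown, word
    # (C) collapse to the minimum known word, which also becomes the conveyed word.
    best = min(known)
    return {best}, best

rng = random.Random(0)
print(local_rule({2}, 2, [2, 3, 4], 0.5, rng))        # addition branch
print(local_rule({1, 2, 3}, 2, [3, 2, 3], 0.5, rng))  # collapse branch
```

Sampling from a sorted list rather than from the set keeps the draw reproducible for a fixed seed.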


3 Methods 

To explicitly describe the amount of local agreement between
individuals, a function, called the “energy”, is defined (for a
similar function, see [5]). This energy-based approach arises
from a physicist's interpretation. The energy measures the
amount of local instability of the configuration. Large values
of energy imply that the system keeps evolving until it reaches
ordered configurations. At each neighborhood $V_u^r$, the
function $\delta(x_u, x_v)$, $v \in V_u^r$, is defined, which
is 1 in the case that $x_u = x_v$ (agreement between the
vertices $u$ and $v$), and 0 otherwise (disagreement). Thus,
the amount of local agreement of the neighborhood is measured
by $\frac{1}{|V_u^r|} \sum_{v \in V_u^r} \delta(x_u, x_v)$;
summing this quantity over all vertices defines the total
energy of the configuration at that time:

$$
E(t) = -\frac{1}{n} \sum_{u \in P} \frac{1}{|V_u^r|} \sum_{v \in V_u^r} \delta(x_u, x_v)
\tag{2}
$$


The function $E(t)$ is bounded by two extreme agreement cases:
$E(t) = -1$ if all individuals convey the same word (global
agreement); $E(t) = 0$ if each individual conveys a different
word. The global agreement case coincides with the final
absorbing state of the Naming Game, where there is one unique
shared word.
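A minimal sketch of the energy computation follows. It assumes that the local agreement is normalized by the neighborhood size, so that the energy stays in the interval $[-1, 0]$ with $-1$ at global agreement; the function name `energy` and the dictionary-based representation are hypothetical.

```python
def energy(words, neighbors):
    """Energy of a configuration, assumed to lie in [-1, 0].

    words     -- dict mapping each vertex u to its conveyed word x_u
    neighbors -- dict mapping each vertex u to its (non-empty) neighborhood
    """
    total = 0.0
    for u, nbrs in neighbors.items():
        agree = sum(1 for v in nbrs if words[v] == words[u])  # sum of delta(x_u, x_v)
        total += agree / len(nbrs)                            # local agreement in [0, 1]
    return -total / len(neighbors)

# Hypothetical ring of 4 vertices with radius-1 neighborhoods.
ring = {0: [3, 1], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(energy({0: "a", 1: "a", 2: "a", 3: "a"}, ring))  # global agreement
print(energy({0: "a", 1: "b", 2: "c", 3: "d"}, ring))  # every word different
print(energy({0: "a", 1: "a", 2: "b", 3: "b"}, ring))  # partial agreement
```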

The analysis is focused on a two-dimensional periodic lattice
of size $n = 128^2 = 16384$ with Von Neumann neighborhood. The
final value of $E(t)$ (after $200n$ time steps or until
$E(t) = -1$ is reached) is described for several values of $p$
and $r$: $p$ varies from 0 to 1 in increments of 10% (steps of
0.1), and $r \in \{1, 2, 3, 4\}$ (respectively, 4, 12, 24 and
40 neighbors). In general, a Von Neumann neighborhood of radius
$r$ contains $2r(r + 1)$ neighbors. Even though the radius
$r = 4$ implies 40 neighbors, there is no loss of locality.
Indeed, 40 neighbors amount to $40/n \approx 0.24\%$ of the
population of individuals.
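The neighbor counts quoted above can be checked with a short sketch that enumerates a Von Neumann neighborhood on a periodic lattice. The function name `von_neumann_neighbors` and the coordinate representation of cells are assumptions made for illustration.

```python
def von_neumann_neighbors(i, j, r, side):
    """Cells at Manhattan distance 1..r from (i, j) on a periodic side x side lattice."""
    cells = set()
    for di in range(-r, r + 1):
        span = r - abs(di)  # remaining distance budget along the other axis
        for dj in range(-span, span + 1):
            if (di, dj) != (0, 0):
                cells.add(((i + di) % side, (j + dj) % side))  # periodic wrap-around
    return cells

side = 128
for r in (1, 2, 3, 4):
    size = len(von_neumann_neighbors(0, 0, r, side))
    # Compare with the closed form 2r(r + 1) and report the share of the population.
    print(r, size, 2 * r * (r + 1), f"{size / side**2:.4%}")
```

The closed form holds as long as the lattice side is larger than $2r$, so that wrapped cells do not coincide.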


4 Sharp transitions on two dimensional lattices 

Several aspects are remarkable in the behavior of the final
energy $E_f$ versus $p$, as shown in Figure 2 (top). For
$r = 1$, the dynamics reaches the configuration of global
agreement ($E(t) = -1$) for $p < 1$. For $r = 2, 3, 4$, the
behavior of $E_f$ versus $p$ exhibits three clear domains.
First, $E_f$ reaches the minimum $-1$ for $p < p_c^r$. This
value depends on $r$: $p_c^2 \approx 0.43$,
$p_c^3 \approx 0.25$, $p_c^4 \approx 0.17$. In general, an
increase in the radius $r > 1$ implies a decrease in the
critical parameter $p_c^r$. Second, a drastic change is found
at $p = p_c^r$: the dynamics loses the convergence to the
global minimum $E_f = -1$. Finally, for $p > p_c^r$ the
dynamics seems to reach a stationary value $E_f > -1$, which
increases as $r$ grows.

The standard deviation $\sigma$ of the data versus $p$, as
shown in Figure 2 (bottom), confirms the previous observations.
Three aspects are remarkable. First, $\sigma$ takes small
values in all cases. Second, for $p < p_c^r$, $r > 1$, the
standard deviation is close to 0. Third, a peak in $\sigma$ is
observed close to the critical parameters $p_c^r$, $r > 1$. The
values of these peaks strongly depend on the radius $r$: the
more $r$ increases, the more the associated peak grows.
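One way to summarize such runs is to compute, for each value of $p$, the mean and standard deviation of the final energies and then take the lowest $p$ whose mean stays above $-1$. The sketch below is only an assumption about how this could be done: the names `summarize` and `critical_parameter`, the tolerance `tol`, and the use of the population standard deviation are all hypothetical choices, and the numbers are made up for illustration and are not results from the paper.

```python
from statistics import mean, pstdev

def summarize(final_energies):
    """Map each p to (mean final energy, standard deviation) over the runs."""
    return {p: (mean(vals), pstdev(vals)) for p, vals in final_energies.items()}

def critical_parameter(final_energies, tol=1e-3):
    """Lowest p whose mean final energy exceeds -1 by more than tol, else None."""
    summary = summarize(final_energies)
    for p in sorted(summary):
        if summary[p][0] > -1.0 + tol:
            return p
    return None

# Made-up numbers for illustration only; NOT data from the simulations.
toy_runs = {
    0.1: [-1.0, -1.0, -1.0],
    0.2: [-1.0, -1.0, -1.0],
    0.3: [-0.8, -0.6, -0.7],
    0.4: [-0.5, -0.5, -0.4],
}
print(summarize(toy_runs)[0.1])
print(critical_parameter(toy_runs))
```

The tolerance stands in for the word "clearly" in the definition of the critical parameter; a different threshold could shift the detected value.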
Fig. 2 $E_f$ versus $p$ on two-dimensional lattices. (top) On
a two-dimensional lattice of size $n = 128^2$, the final value
of the energy function, $E_f$, is shown versus the parameter
$p$, after $200n$ steps or until $E(t)$ reaches the global
minimum $-1$. Averages over 100 initial conditions
($r = 1, 2$) and 10 initial conditions ($r = 3, 4$), with
$|W| = n$. First, $r \in \{1, 2, 3, 4\}$ (respectively, 4, 12,
24 and 40 neighbors) and $p$ is varied from 0 to 1 with an
increment of 10%. With these parameters,
$p_c^2 \in (0.4, 0.5)$, $p_c^3 \in (0.2, 0.3)$ and
$p_c^4 \in (0.1, 0.2)$. New simulations are then run over the
previous critical zones. At each critical zone, $p$ is varied
with an increment of 10%. The critical parameter $p_c^r$,
$r > 1$, is then defined as the lowest value of $p$ in the
critical zone such that the energy function clearly does not
converge to the global minimum $E(t) = -1$. (bottom) Standard
deviation of the data for different values of $r$.

5 Discussion 

The sudden changes observed on two-dimensional lattices and
the peaks in standard deviation, as shown in Figure 2, together
with the presence of power laws (Figure 3), suggest the
appearance of sharp (phase) transitions at $p = p_c^r$ for
$r > 1$ [H]. As noticed above, $r = 4$ exhibits the most
drastic sharp transition. At the different critical
forgetfulness parameters, scaling relations appear: low-ranked
words (the first-ranked ones) are associated with multiple
individuals, whereas several words are related to one-to-one
individual-word associations. Despite the appearance of similar
slopes for $r > 1$, the scaling relations for $r = 4$ differ
from those for $r = 2, 3$, as shown in Fig. 3. More precisely,
for $r = 4$ the frequency rank $k$ is associated with small
frequencies in comparison with $r = 2, 3$.

Fig. 3 $P(k)$ versus $k$ on two-dimensional lattices, for the
critical values $p_c^r$ of $r = 2$ ($p_c^2 = 0.43$), $r = 3$
($p_c^3 = 0.25$) and $r = 4$ ($p_c^4 = 0.17$). On a
two-dimensional lattice of size $n = 128 \times 128$, after
$200n$ time steps, the global distribution of the number of
agents showing the $k$-ranked word, $P(k)$, is exhibited versus
$k$ (log-log plot). Averages over the same initial conditions
as in Figure 2.
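The rank distribution $P(k)$ can be obtained from the words conveyed by all agents at the end of a run. The following sketch assumes that $P(k)$ is the number of agents conveying the word of rank $k$, with ranks assigned in decreasing order of frequency; the function name `rank_distribution` and the sample input are hypothetical.

```python
from collections import Counter

def rank_distribution(conveyed_words):
    """Return [(k, P(k))]: number of agents conveying the k-th most common word."""
    counts = Counter(conveyed_words)               # agents per word
    ranked = sorted(counts.values(), reverse=True) # most common word gets rank 1
    return list(enumerate(ranked, start=1))

# Hypothetical final configuration of six agents (illustration only).
print(rank_distribution([1, 1, 1, 2, 2, 3]))
```

Plotting these pairs on logarithmic axes would give the kind of rank-frequency curve described in the text.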

The simple approach of this paper to the individual’s
forgetfulness introduces a novel framework to study the
influence of minimal cognitive mechanisms on the formation and
evolution of languages.

Future work could involve the study of the dynamics on general
topologies (for instance, random graphs), more complex
cognitive mechanisms of memory capacity, or the influence of a
large radius $r$ (for instance, $r \sim \sqrt{n}$) on the
appearance of sharp transitions.